Feature Selection with Attributes Clustering by Maximal Information Coefficient
نویسندگان
چکیده
Feature selection is usually a separate procedure which can not benefit from result of the data exploration. In this paper, we propose a unsupervised feature selection method which could reuse a specific data exploration result. Furthermore, our algorithm follows the idea of clustering attributes and combines two state-of-the-art data analyzing methods, that’s maximal information coefficient and affinity propagation. Classification problems with different classifiers were tested to validation our method and others. Data experiments result exhibits our unsupervised algorithm is comparable with classical feature selection methods and even outperforms some supervised learning algorithms. Data simulation with one credit dataset of our own from a bank of China shows the capability of our method for real world application.
منابع مشابه
Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data be introduced. To that end, categorizing frameworks for finding selected subsets, namely, search-based and non-search based procedures as well as evaluation criteria and data mining tasks are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...
متن کاملOptimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data be introduced. To that end, categorizing frameworks for finding selected subsets, namely, search-based and non-search based procedures as well as evaluation criteria and data mining tasks are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...
متن کاملA Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset
Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationary and uncertainty of the signals should be used for ECG analysis. Ex...
متن کاملUnsupervised Feature Selection by Means of External Validity Indices
Feature selection for unsupervised data is a difficult task because a reference partition is not available to evaluate the relevance of the features. Recently, different proposals of methods for consensus clustering have used external validity indices to assess the agreement among partitions obtained by clustering algorithms with different parameter values. Theses indices are independent of the...
متن کاملIntrusion Detection based on a Novel Hybrid Learning Approach
Information security and Intrusion Detection System (IDS) plays a critical role in the Internet. IDS is an essential tool for detecting different kinds of attacks in a network and maintaining data integrity, confidentiality and system availability against possible threats. In this paper, a hybrid approach towards achieving high performance is proposed. In fact, the important goal of this paper ...
متن کامل